58 research outputs found
On combining information-theoretic and cryptographic approaches to network coding security against the pollution attack
In this paper we consider the pollution attack in network coded systems where network nodes are computationally limited. We consider the combined use of cryptographic signature based security and information theoretic network error correction and propose a fountain-like network error correction code construction suitable for this purpose
3D ShapeNets: A Deep Representation for Volumetric Shapes
3D shape is a crucial but heavily underutilized cue in today's computer
vision systems, mostly due to the lack of a good generic shape representation.
With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft
Kinect), it is becoming increasingly important to have a powerful 3D shape
representation in the loop. Apart from category recognition, recovering full 3D
shapes from view-based 2.5D depth maps is also a critical part of visual
understanding. To this end, we propose to represent a geometric 3D shape as a
probability distribution of binary variables on a 3D voxel grid, using a
Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the
distribution of complex 3D shapes across different object categories and
arbitrary poses from raw CAD data, and discovers hierarchical compositional
part representations automatically. It naturally supports joint object
recognition and shape completion from 2.5D depth maps, and it enables active
object recognition through view planning. To train our 3D deep learning model,
we construct ModelNet -- a large-scale 3D CAD model dataset. Extensive
experiments show that our 3D deep representation enables significant
performance improvement over the-state-of-the-arts in a variety of tasks.Comment: to be appeared in CVPR 201
Understanding and Predicting Image Memorability at a Large Scale
Progress in estimating visual memorability has been limited by the small scale and lack of variety of benchmark data. Here, we introduce a novel experimental procedure to objectively measure human memory, allowing us to build LaMem, the largest annotated image memorability dataset to date (containing 60,000 images from diverse sources). Using Convolutional Neural Networks (CNNs), we show that fine-tuned deep features outperform all other features by a large margin, reaching a rank correlation of 0.64, near human consistency (0.68). Analysis of the responses of the high-level CNN layers shows which objects and regions are positively, and negatively, correlated with memorability, allowing us to create memorability maps for each image and provide a concrete method to perform image memorability manipulation. This work demonstrates that one can now robustly estimate the memorability of images from many different classes, positioning memorability and deep memorability features as prime candidates to estimate the utility of information for cognitive systems. Our model and data are available at: http://memorability.csail.mit.edu.National Science Foundation (U.S.) (Grant 1532591)McGovern Institute for Brain Research at MIT. Neurotechnology (MINT) ProgramMassachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory. MIT Big Data InitiativeGoogle (Firm)Xerox Corporatio
Object Detectors Emerge in Deep Scene CNNs
With the success of new computational architectures for visual processing,
such as convolutional neural networks (CNN) and access to image databases with
millions of labeled examples (e.g., ImageNet, Places), the state of the art in
computer vision is advancing rapidly. One important factor for continued
progress is to understand the representations that are learned by the inner
layers of these deep architectures. Here we show that object detectors emerge
from training CNNs to perform scene classification. As scenes are composed of
objects, the CNN for scene classification automatically discovers meaningful
objects detectors, representative of the learned scene categories. With object
detectors emerging as a result of learning to recognize scenes, our work
demonstrates that the same network can perform both scene recognition and
object localization in a single forward-pass, without ever having been
explicitly taught the notion of objects.Comment: 12 pages, ICLR 2015 conference pape
ImageNet Large Scale Visual Recognition Challenge
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in
object category classification and detection on hundreds of object categories
and millions of images. The challenge has been run annually from 2010 to
present, attracting participation from more than fifty institutions.
This paper describes the creation of this benchmark dataset and the advances
in object recognition that have been possible as a result. We discuss the
challenges of collecting large-scale ground truth annotation, highlight key
breakthroughs in categorical object recognition, provide a detailed analysis of
the current state of the field of large-scale image classification and object
detection, and compare the state-of-the-art computer vision accuracy with human
accuracy. We conclude with lessons learned in the five years of the challenge,
and propose future directions and improvements.Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL
VOC (per-category comparisons in Table 3, distribution of localization
difficulty in Fig 16), a list of queries used for obtaining object detection
images (Appendix C), and some additional reference
Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer
Importance Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency.
Objective Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin–stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists’ diagnoses in a diagnostic setting.
Design, Setting, and Participants Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining were provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands to ascertain likelihood of nodal metastases for each slide in a flexible 2-hour session, simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC).
Exposures Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation.
Main Outcomes and Measures The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor.
Results The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC).
Conclusions and Relevance In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting
Undoing the Damage of Dataset Bias
The presence of bias in existing object recognition datasets is now well-known in the computer vision community. While it remains in question whether creating an unbiased dataset is possible given limited resources, in this work we propose a discriminative framework that directly exploits dataset bias during training. In particular, our model learns two sets of weights: (1) bias vectors associated with each individual dataset, and (2) visual world weights that are common to all datasets, which are learned by undoing the associated bias from each dataset. The visual world weights are expected to be our best possible approximation to the object model trained on an unbiased dataset, and thus tend to have good generalization ability. We demonstrate the effectiveness of our model by applying the learned weights to a novel, unseen dataset, and report superior results for both classification and detection tasks compared to a classical SVM that does not account for the presence of bias. Overall, we find that it is beneficial to explicitly account for bias when combining multiple datasets. Keywords: Target Domain; Domain Adaptation; Transfer Learning; Visual World; Spatial Pyrami
Predicting human behavior using visual media
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.Cataloged from PDF version of thesis.Includes bibliographical references (pages 161-173).The ability to predict human behavior has applications in many domains ranging from advertising to education to medicine. In this thesis, I focus on the use of visual media such as images and videos to predict human behavior. Can we predict what images people remember or forget? Can we predict the type of images people will like? Can we use a photograph of someone to determine their state of mind? These are some of the questions I tackle in this thesis. Through my work, I demonstrate: (1) It is possible to predict with near human-level correlation, the probability with which people will remember images, (2) it is possible to predictably modify the extent to which a face photograph is remembered, (3) it is possible to predict, with a high correlation, the number of views an image will receive even before it is uploaded, (4) it is possible to accurately identify the gaze of people in images, both from the perspective of a device, and third-person. Further, I develop techniques to visualize and understand machine learning algorithms that could help humans better understand themselves through the analysis of algorithms capable of predicting behavior. Overall, I demonstrate that visual media is a rich resource for the prediction of human behavior.by Aditya Khosla.Ph. D
- …